1. INTRODUCTION

Classification is the process of finding a model or function that explains and characterizes
concepts or data classes for a specific purpose [1]. Many techniques are applied in classification;
one of them is the K-Nearest Neighbor (KNN) method, which classifies an object based on the learning
data closest to that object. Classification attempts to predict whether a given case falls under
a specific category or class, in contrast to regression, which focuses on the numeric value a variable
will take [2, 3]. Learning data are projected into a high-dimensional space in which each dimension
represents a feature of the data. KNN belongs to instance-based learning, or lazy learning, where
the function is only approximated locally and all computation is deferred until classification [4, 5].
A point in this space is assigned to a class if that class is the most common one among its nearest
neighbors. The distance to the neighbors is usually calculated as the Euclidean distance [6].
Therefore, the regulation and policy should be considered before the implementation takes place,
because the decision point affects the quality of the result as well as effectiveness
and efficiency [7-9].

One study [10] applied KNN with the Euclidean distance in the learning process to classify
prospective teachers and employees for recruitment at vocational high schools, combining it with
the Weighted Product (WP) method and several criteria weight values. It obtained an accuracy of 94%,
a precision of 80%, and a recall of 80%. Another study [11] applied KNN in a classification system
for predicting students' achievement; the Euclidean distance formula was again applied in the learning
process, and the predicted achievement scores reached an accuracy of 82%. A further study [12] applied
KNN and used the Euclidean distance formula in the learning process to derive a new weighting.
In addition, [13] conducted a classification study applying KNN and support vector machine (SVM),
with an accuracy of more than 99.83%, a sensitivity of more than 0.995 and a specificity
of more than 0.998.

In these studies, KNN was generally applied with the Euclidean distance formula
in the learning process. For other methods, several researchers have optimized the method
by optimizing the distance formula. One study [14] optimized the Simple Evolving Connectionist
Systems (SECoS) method by testing the normalized Euclidean, normalized Manhattan and normalized
Hamming distance formulas. Another [15] optimized the configuration of the discrete wavelet frame (DWF)
in a texture feature extraction method for images, involving the Manhattan distance, Euclidean distance,
normalized Manhattan distance and normalized Euclidean distance. Based on the previous research
with the KNN method, it was necessary to optimize the search for the closest distance by comparing
several distance formulas. The optimization process replaces the Euclidean distance formula with
the normalized Euclidean, Manhattan and normalized Manhattan distance formulas to obtain optimal
calculation results. The sample data used were credit card payment usage data with 30,000 records
and 23 attributes obtained from the UCI Machine Learning Repository. This study implements k-Nearest
Neighbor for big data analytics, where optimization means making the best or most effective use
of a situation or resource, to help identify the optimal value that allows the process
to be simplified in certain cases.


2. METHODOLOGY 

KNN is a method of classifying objects based on the learning data closest to the object.
It aims to classify new objects based on attributes and training samples. Given a query point,
the method finds the K objects or training points closest to the query point. The predicted value
of the query is determined from the classification of these neighbors. Before performing calculations
with the K-Nearest Neighbor method, the training and test data must first be determined. The distances
are then calculated by applying the Euclidean distance formula. The technique is very simple
and easy to implement. Similar to clustering techniques, it groups new data based on their distance
to some of the closest data (neighbors). The similarity function produces a value that determines
whether there are similarities between the new case and those in the case base. Similarity can be
determined with several functions, e.g. the Euclidean distance similarity function. The disadvantage
of the Euclidean distance function is that if one input attribute has a relatively large range,
it can dominate the other attributes. Consequently, the distance is often normalized by dividing
the distance for each attribute by the range (i.e. the maximum value minus the minimum value)
of that attribute, so that the values of each attribute have a new normalized range of 0 to 1.
Equation (1) is the formula for normalizing data to the range 0 to 1.


$$ y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1) $$

where: x = value of the data
y = normalized value
x_min = minimum value of the attribute
x_max = maximum value of the attribute
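The min-max normalization in (1) can be illustrated with a minimal Python sketch. The function name `min_max_normalize`, the sample values, and the handling of a constant attribute (mapping it to 0.0) are assumptions made for illustration and are not taken from the system described in this paper.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] following equation (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant attribute carries no information, so map it to 0.0
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical attribute values, used only to show the rescaling.
sample = [20, 35, 50]
print(min_max_normalize(sample))  # [0.0, 0.5, 1.0]
```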
Seen from its N value, the simplest type of this method works as follows:
- 1-NN: predictions are made from the 1 closest labeled data point.
- Calculate the distance between the new data and each labeled data point.
- Determine the 1 labeled data point that has the minimum distance.
- Assign the new data to the class of that labeled data point.
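The 1-NN steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_1nn`, the two-feature points and their labels are hypothetical, and the standard Euclidean distance (`math.dist`, Python 3.8+) stands in for whichever distance formula is chosen.

```python
import math


def predict_1nn(query, labeled_data):
    """Return the label of the labeled point closest to `query`.

    `labeled_data` is a list of (point, label) pairs; points are tuples of
    already-normalized feature values.
    """
    # Step 1 and 2: compute the distance to every labeled point and keep the minimum.
    _nearest_point, nearest_label = min(
        labeled_data, key=lambda item: math.dist(query, item[0])
    )
    # Step 3: the new data takes the label of that nearest point.
    return nearest_label


# Hypothetical training points with an "agree" label.
training = [((0.1, 0.2), "no"), ((0.9, 0.8), "yes"), ((0.4, 0.5), "no")]
print(predict_1nn((0.85, 0.9), training))  # yes
```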
After normalization, the proximity value of the data is calculated. This calculation is applied
to find predictions using the Euclidean distance formula. Equation (2) for calculating the proximity
between two cases is as follows.


$$ \mathrm{similarity}(T, S) = \frac{\sum_{i=1}^{n} f(T_i, S_i)\, w_i}{\sum_{i=1}^{n} w_i} \qquad (2) $$

where: T = new case
S = case in storage
n = number of attributes in each case
i = individual attribute, from 1 to n
f = similarity function of attribute i between case T and case S
w = the weight given to attribute i
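As a rough illustration of the weighted similarity in (2), the sketch below divides the weighted sum of per-attribute closeness values by the sum of the weights, which is how the worked examples later in the paper are computed. The function name `weighted_similarity` and the two-attribute sample values are hypothetical.

```python
def weighted_similarity(closeness_values, weights):
    """Weighted average of per-attribute closeness values f(T_i, S_i)."""
    if len(closeness_values) != len(weights):
        raise ValueError("each attribute needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    weighted_sum = sum(f * w for f, w in zip(closeness_values, weights))
    return weighted_sum / total_weight


# Hypothetical case with two attributes of equal weight.
print(weighted_similarity([1.0, 0.5], [1.0, 1.0]))  # 0.75
```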

Equation (2) is applied to the new data against all old data, and the time taken by the system
for this calculation is recorded. In this study, however, the KNN method was optimized by replacing
the Euclidean distance formula with the Hamming distance formula and the Manhattan distance formula
in order to find a more optimal closeness value. The Hamming distance formula is given in (3).


$$ D_H = \frac{\sum_{i=1}^{K} |I_i - W_i|}{\sum_{i=1}^{K} |I_i + W_i|} \qquad (3) $$

where: K = number of attributes in each case
I = new case
W = the proximity value of the case in storage
After obtaining the results of the two distance values, they are compared with the Manhattan distance
formula, which is given in (4). After obtaining the values of the three distance formulas,
the next step is to discuss them in the KNN method.


$$ D_M = \frac{\sum_{i=1}^{K} |I_i - W_i|}{\sum_{i=1}^{K} w_i} \qquad (4) $$
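For reference, the sketch below gives the standard textbook definitions of the three distance measures compared in this study. These are assumptions for illustration: the normalized and weighted forms used in this paper may differ from them, and the function names and sample vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical feature vectors.
new_case = (0, 0, 1)
old_case = (3, 4, 1)
print(euclidean(new_case, old_case))  # 5.0
print(manhattan(new_case, old_case))  # 7
print(hamming(new_case, old_case))    # 2
```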


2.1. Data used 

The data used are the Bank Marketing data obtained from the UCI Machine Learning Repository.
They describe prospective bank customers for whom taking up credit is to be predicted. The data consist
of 41,188 customers with 6 attributes, namely occupation, marital status, education, having a home loan,
having a bank loan, and agreeing to the credit offered by the bank. The data were used in earlier
research [16] on predicting bank telemarketing outcomes.


2.2. General architecture 
The general architecture of the implemented method is illustrated in Figure 1. It can be explained
in stages as follows:
- The acquired dataset is stored in the database by entering all old data.
- New data are input to obtain the proximity value; before processing, the data to be trained are
normalized to a range from 0 to 1.
- Calculate the proximity value of the new case to all old cases by applying the Euclidean
distance formula.
- Calculate the proximity value of the new case to all old cases by applying the Hamming
distance formula.
- Calculate the proximity value of the new case to all old cases by applying the Manhattan
distance formula.
- Display the proximity values based on the three distance formulas used.
- Conclude which distance formula is more optimal to use with the existing case data.

Figure 1. General architecture

3. RESULTS AND DISCUSSION
3.1. Analysis of the nearest neighbor method 

Nearest neighbor is an approach that searches for cases by calculating the closeness between a new case
and old cases, based on matching the weights of a number of existing features [5]. An example is
finding a solution for a new customer by using the solution from a previous customer. To find out which
old customer will be used, the closeness of the new customer's case to every old customer's case
is calculated. Instance-based learning has several advantages over rule-based classification
methods: it is very robust and can outperform conventional parametric classifiers when the actual
distribution of the data differs from the assumed distribution. It also establishes the decision boundary
automatically from a training set, and the boundary can be refined incrementally when new training
samples are added to the existing samples [17]. There are many k-Nearest Neighbor techniques, such as
weighted kNN, condensed kNN, reduced kNN, model-based kNN, rank kNN, modified kNN, pseudo/generalized NN,
clustered kNN, ball tree kNN, k-d tree, nearest feature line (NFL) neighbor, local NN, tunable NN,
center-based NN, principal axis tree NN and orthogonal search tree NN [18]. Most of these techniques
focus on good performance, less computation time, fast search and effectiveness for large data sets.
Therefore, when there is little or no prior knowledge about the distribution of the data, the KNN method
should be one of the first choices for classification, because it is a powerful non-parametric
classification system that bypasses the problem of probability densities completely [19, 20].
The accuracy of KNN remains high in most cases but decreases as the size of the dataset increases,
and the time taken to calculate all required values also increases as the dataset becomes larger [21].
The old customer case with the greatest closeness is used for the new customer. In Figure 2, there are
4 old customers, namely A, B, C, and D. When there is a new customer, the solution is found by
calculating the distance between the new customer and all old customers. The solution of the old
customer with the closest distance is used; in Figure 2, the solution of old customer B is used because
it has the shortest distance. In this study we test the proximity value with 3 distance formulas:
the Euclidean, Hamming and Manhattan distance formulas. The three distance formulas are compared
to determine which gives the optimal value, so that changing the distance formula can optimize
the KNN method, with the bank telemarketing data as test data.

Figure 2. Illustration

3.2. Problem solving analysis

As an example case, consider predicting whether a new bank customer has problems or not
based on the data.
a. Case table
Table 1 is an example of old customer cases.

Table 1. Example of case table of old customers

Name  Education      Status   Home Credit  Bank Credit  Occupation         Agree
A     >=Bachelor     Single   No           No           Entrepreneur       no
B     <=High School  Married  Yes          No           Entrepreneur       yes
C     >=Bachelor     Single   No           Yes          Private Employees  yes
D     D1-D3          Married  No           Yes          Civil Servant      no

b. Determine the weight of each attribute
Table 2 gives the weight of each attribute.

Table 2. Weight of each attribute

Attribute    Weight
Education    0.5
Status       0.5
Home Credit  1
Bank Credit  1
Occupation   0.75


c. The next step is to determine the closeness of the values in each attribute
- The closeness of the education value. Table 3 is an example of the closeness
of the values in the education attribute.

Table 3. Closeness of the values in the education attribute

Education      Education      Closeness
<=High School  <=High School  1
Diploma        Diploma        1
>=Bachelor     >=Bachelor     1
<=High School  Diploma        0.5
Diploma        <=High School  0.5
<=High School  >=Bachelor     0.4
>=Bachelor     <=High School  0.4
Diploma        >=Bachelor     0.75
>=Bachelor     Diploma        0.75


- The closeness of the status value. Table 4 is an example of the closeness
of the status value.

Table 4. Closeness of the status value

Status    Status    Closeness
Single    Single    1
Married   Married   1
Divorced  Divorced  1
Single    Married   0.5
Married   Single    0.5
Single    Divorced  0.4
Divorced  Single    0.4
Married   Divorced  0.75
Divorced  Married   0.75


- The closeness of the home credit value. Table 5 is an example of the closeness
of the home credit value.

Table 5. Closeness of the home credit value

Home Credit  Home Credit  Closeness
Yes          Yes          1
No           No           1
Yes          No           0.7
No           Yes          0.7


- The closeness of the bank credit value. Table 6 is an example of the closeness
of the bank credit value.

Table 6. Closeness of the bank credit value

Bank Credit  Bank Credit  Closeness
Yes          Yes          1
No           No           1
Yes          No           0.7
No           Yes          0.7


- The closeness of the occupation value. Table 7 is an example of the closeness
of the occupation value.

Table 7. Closeness of the occupation value

Occupation         Occupation         Closeness
Private Employees  Private Employees  1
Entrepreneur       Entrepreneur       1
Civil Servant      Civil Servant      1
Private Employees  Entrepreneur       0.5
Entrepreneur       Private Employees  0.5
Private Employees  Entrepreneur       0.4
Entrepreneur       Private Employees  0.4
Civil Servant      Entrepreneur       0.75
Entrepreneur       Civil Servant      0.75
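
Closeness tables of this kind are symmetric, so each pair of values only needs to be stored once. The following Python sketch encodes the education and home credit values from Tables 3 and 5; the dictionary layout, the use of `frozenset` keys and the function name `attribute_closeness` are assumptions for illustration, not the storage format of the system described here.

```python
# Closeness of two different education values (Table 3); order does not matter.
EDUCATION_CLOSENESS = {
    frozenset(("<=High School", "Diploma")): 0.5,
    frozenset(("<=High School", ">=Bachelor")): 0.4,
    frozenset(("Diploma", ">=Bachelor")): 0.75,
}

# Closeness of two different home credit values (Table 5).
HOME_CREDIT_CLOSENESS = {
    frozenset(("Yes", "No")): 0.7,
}


def attribute_closeness(table, value_a, value_b):
    """Look up the closeness of two attribute values; identical values score 1."""
    if value_a == value_b:
        return 1.0
    return table[frozenset((value_a, value_b))]


print(attribute_closeness(EDUCATION_CLOSENESS, "Diploma", ">=Bachelor"))  # 0.75
print(attribute_closeness(HOME_CREDIT_CLOSENESS, "No", "Yes"))            # 0.7
print(attribute_closeness(HOME_CREDIT_CLOSENESS, "No", "No"))             # 1.0
```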


An example of problem solving is a new customer with the following attribute values:
- Education: Diploma
- Status: Single
- Home Credit: No
- Bank Credit: No
- Occupation: Entrepreneur

To predict whether the customer will agree or not, the following steps are taken.

e. Next, determine the weight of each attribute.


3.3. Analysis by using the Euclidean distance formula

- Calculate the closeness between the new customer case and case A. Table 8 is an
example of the closeness of the new case and case A.

Table 8. Example of the closeness table of the new case and case A

Attribute    New Case      Old Case      Closeness Value  Weight of Attribute
Education    Diploma       >=Bachelor    0.75             0.5
Status       Single        Single        1                0.5
Home Credit  No            No            1                1
Bank Credit  No            No            1                1
Occupation   Entrepreneur  Entrepreneur  1                0.75

The closeness of the new case and case A is calculated by applying (2):

$$ \frac{0.75 \times 0.5 + 0.7 \times 0.5 + 1 \times 1 + 1 \times 1 + 1 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{2.653125}{3.75} = 0.7075 $$


- Calculate the closeness between the new customer case and case B. Table 9 is an
example of the closeness table of the new case and case B.

Table 9. Example of the closeness table of the new case and case B

Attribute    New Case      Old Case       Closeness Value  Weight of Attribute
Education    Diploma       <=High School  0.5              0.5
Status       Single        Married        0.5              0.5
Home Credit  No            No             1                1
Bank Credit  No            No             1                1
Occupation   Entrepreneur  Entrepreneur   1                0.75

The closeness of the new case and case B is calculated by applying (2):

$$ \frac{0.5 \times 0.5 + 1 \times 0.5 + 0.4 \times 1 + 1 \times 1 + 1 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{2.26875}{3.75} = 0.605 $$

- Calculate the closeness between the new customer case and case C. Table 10 is an example
of the closeness table of the new case and case C.

Table 10. Example of the closeness table of the new case and case C

Attribute    New Case      Old Case           Closeness Value  Weight of Attribute
Education    Diploma       >=Bachelor         0.75             0.5
Status       Single        Single             0.7              0.5
Home Credit  No            No                 1                1
Bank Credit  No            Yes                0.4              1
Occupation   Entrepreneur  Private Employees  0.4              0.75

The closeness of the new case and case C is calculated by applying (2) as follows:

$$ \frac{0.75 \times 0.5 + 0.7 \times 0.5 + 1 \times 1 + 0.4 \times 1 + 0.4 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{1.753125}{3.75} = 0.4675 $$


- Calculate the closeness between the new customer case and case D. Table 11 is an example
of the closeness table of the new case and case D.

Table 11. Example of the closeness table of the new case and case D

Attribute    New Case      Old Case       Closeness Value  Weight of Attribute
Education    Diploma       Diploma        1                0.5
Status       Single        Married        1                0.5
Home Credit  No            No             0.5              1
Bank Credit  No            Yes            0.4              1
Occupation   Entrepreneur  Civil Servant  0.6              0.75

The closeness of the new case and case D is calculated by applying (2) as follows:

$$ \frac{1 \times 0.5 + 1 \times 0.5 + 0.5 \times 1 + 0.4 \times 1 + 0.6 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{1.6875}{3.75} = 0.45 $$


From the calculation of the closeness between the new case and cases A, B, C and D, the greatest
closeness value is obtained for case A, so the outcome of case A is used as the prediction of whether
the new customer agrees or disagrees with the bank offer.
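
The selection step can be sketched as follows: divide each printed numerator by its denominator, pick the case with the greatest closeness, and read off that case's "Agree" value from Table 1. The numbers are the ones printed in the worked examples of this section; the variable names are hypothetical and the code is a sketch, not the system's implementation.

```python
# (numerator, denominator) pairs as printed in the worked examples of Section 3.3.
printed_fractions = {
    "A": (2.653125, 3.75),
    "B": (2.26875, 3.75),
    "C": (1.753125, 3.75),
    "D": (1.6875, 3.75),
}

# "Agree" column of Table 1 for the four old customers.
agree_label = {"A": "no", "B": "yes", "C": "yes", "D": "no"}

closeness = {name: num / den for name, (num, den) in printed_fractions.items()}

# The old case with the greatest closeness supplies the prediction.
best_case = max(closeness, key=closeness.get)
prediction = agree_label[best_case]

print(best_case, prediction)  # A no
```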


3.4. Analysis by applying the Hamming distance formula
- Calculate the closeness between the new customer case and case A. Table 12 is an example
of the new case with case A.

Table 12. Example of the new case with case A

Attribute    New Case      Old Case      Closeness Value  Weight of Attribute
Education    Diploma       >=Bachelor    0.75             0.5
Status       Single        Single        1                0.5
Home Credit  No            No            1                1
Bank Credit  No            No            1                1
Occupation   Entrepreneur  Entrepreneur  1                0.75

The closeness of the new case with case A was calculated by applying (3) as follows:

$$ \frac{0.75 \times 0.5 - 0.7 \times 0.5 - 1 \times 1 - 1 \times 1 - 1 \times 0.75}{0.75 \times 0.5 + 0.7 \times 0.5 + 1 \times 1 + 1 \times 1 + 1 \times 0.75} = \frac{2.75}{3.5} = 0.785 $$


- Calculate the closeness between the new customer case and case B. Table 13 is an example
of the new case with case B.

Table 13. Example of the new case with case B

Attribute    New Case      Old Case       Closeness Value  Weight of Attribute
Education    Diploma       <=High School  0.5              0.5
Status       Single        Married        0.5              0.5
Home Credit  No            No             1                1
Bank Credit  No            No             1                1
Occupation   Entrepreneur  Entrepreneur   1                0.75

The closeness of the new case with case B was calculated by applying (3) as follows:

$$ \frac{0.5 \times 0.5 - 1 \times 0.5 - 0.4 \times 1 - 1 \times 1 - 1 \times 0.75}{0.5 \times 0.5 + 1 \times 0.5 + 0.4 \times 1 + 1 \times 1 + 1 \times 0.75} = \frac{2.4}{2.9} $$


- Calculate the closeness between the new customer case and case C. Table 14 is an example
of the new case with case C.

Table 14. Example of the new case with case C

Attribute    New Case      Old Case           Closeness Value  Weight of Attribute
Education    Diploma       >=Bachelor         0.75             0.5
Status       Single        Single             0.7              0.5
Home Credit  No            No                 1                1
Bank Credit  No            Yes                0.4              1
Occupation   Entrepreneur  Private Employees  0.4              0.75

The closeness of the new case with case C was calculated by applying (3) as follows:

$$ \frac{0.75 \times 0.5 - 0.7 \times 0.5 - 1 \times 1 - 0.4 \times 1 - 0.4 \times 0.75}{0.75 \times 0.5 + 0.7 \times 0.5 + 1 \times 1 + 0.4 \times 1 + 0.4 \times 0.75} = \frac{1.675}{2.425} $$


- Calculate the closeness between the new customer case and case D. Table 15 is an example
of the new case with case D.

Table 15. Example of the new case with case D

Attribute    New Case      Old Case       Closeness Value  Weight of Attribute
Education    Diploma       Diploma        1                0.5
Status       Single        Married        1                0.5
Home Credit  No            No             0.5              1
Bank Credit  No            Yes            0.4              1
Occupation   Entrepreneur  Civil Servant  0.6              0.75

The closeness of the new case with case D was calculated by applying (3) as follows:

$$ \frac{1 \times 0.5 - 1 \times 0.5 - 0.5 \times 1 - 0.4 \times 1 - 0.6 \times 0.75}{1 \times 0.5 + 1 \times 0.5 + 0.5 \times 1 + 0.4 \times 1 + 0.6 \times 0.75} = \frac{1.35}{2.35} = 0.574 $$


From the calculation of the closeness between the new case and cases A, B, C and D, the greatest
closeness value is obtained for case A, so the outcome of case A is applied as the prediction of whether
the new customer agrees or disagrees with the bank offer.

3.5. Analysis by using the Manhattan distance formula
- Calculate the closeness between the new customer case and case A. Table 16 is an example
of the new case with case A.

Table 16. Example of the new case with case A

Attribute    New Case      Old Case      Closeness Value  Weight of Attribute
Education    Diploma       >=Bachelor    0.75             0.5
Status       Single        Single        1                0.5
Home Credit  No            No            1                1
Bank Credit  No            No            1                1
Occupation   Entrepreneur  Entrepreneur  1                0.75

The closeness of the new case with case A was calculated by applying (4) as follows:

$$ \frac{0.75 \times 0.5 - 0.7 \times 0.5 - 1 \times 1 - 1 \times 1 - 1 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{2.75}{3.75} = 0.733 $$


- Calculate the closeness between the new customer case and case B. Table 17 is an example
of the new case with case B.

Table 17. Example of the new case with case B

Attribute    New Case      Old Case       Closeness Value  Weight of Attribute
Education    Diploma       <=High School  0.5              0.5
Status       Single        Married        0.5              0.5
Home Credit  No            No             1                1
Bank Credit  No            No             1                1
Occupation   Entrepreneur  Entrepreneur   1                0.75

The closeness of the new case with case B was calculated by applying (4) as follows:

$$ \frac{0.5 \times 0.5 - 1 \times 0.5 - 0.4 \times 1 - 1 \times 1 - 1 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{2.4}{3.75} = 0.640 $$


- Calculate the closeness between the new customer case and case C. Table 18 is an example
of the new case with case C.

Table 18. Example of the new case with case C

Attribute    New Case      Old Case           Closeness Value  Weight of Attribute
Education    Diploma       >=Bachelor         0.75             0.5
Status       Single        Single             0.7              0.5
Home Credit  No            No                 1                1
Bank Credit  No            Yes                0.4              1
Occupation   Entrepreneur  Private Employees  0.4              0.75

The closeness of the new case with case C was calculated by applying (4) as follows:

$$ \frac{0.75 \times 0.5 - 0.7 \times 0.5 - 1 \times 1 - 0.4 \times 1 - 0.4 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{1.675}{3.75} = 0.446 $$


- Calculate the closeness between the new customer case and case D. Table 19 is an example
of the new case with case D.

Table 19. Example of the new case with case D

Attribute    New Case      Old Case       Closeness Value  Weight of Attribute
Education    Diploma       Diploma        1                0.5
Status       Single        Married        1                0.5
Home Credit  No            No             0.5              1
Bank Credit  No            Yes            0.4              1
Occupation   Entrepreneur  Civil Servant  0.6              0.75

The closeness of the new case with case D was calculated by applying (4) as follows:

$$ \frac{1 \times 0.5 - 1 \times 0.5 - 0.5 \times 1 - 0.4 \times 1 - 0.6 \times 0.75}{0.5 + 0.5 + 1 + 1 + 0.75} = \frac{1.35}{3.75} = 0.36 $$


From the calculation of the closeness between the new case and cases A, B, C and D, the greatest
closeness value is obtained for case A, so the outcome of case A is used as the prediction of whether
the new customer agrees or disagrees with the bank offer.


3.6. The results of optimizing the KNN method with the Euclidean distance formula

From the 18,000 old cases, the closeness value with the new data is calculated
by applying the Euclidean distance formula. The display of the system is shown in Figure 3.
Figure 3 shows that the proximity value of all the data used is calculated with the Euclidean distance
formula, with 18,000 old cases as training data for the KNN method. For the new case that is input,
the proximity value is 1 (similar) for 8 old cases with the attribute agree = no. That is, with
the proximity value calculated by the Euclidean distance formula, the KNN method returns 8 old cases
in the KNN proximity list view, within 17 minutes 45 seconds.


3.7. The results of optimizing the KNN method with the Hamming distance formula

From the 18,000 old cases, the closeness value with the new data is calculated
by applying the Hamming distance formula. The display of the system is shown in Figure 4.
Figure 4 shows that the proximity value of all the data used is calculated with the Hamming distance
formula, with 18,000 old cases as training data for the KNN method. For the new case that is input,
the closeness value is 0.884 for 29 old cases with the attribute agree = no. That is, with
the proximity value calculated by the Hamming distance formula, no old case reaches a value
of 1 (similar), but 29 old cases in the KNN proximity list view approach it with a value of 0.884,
within 13 minutes 13 seconds.

Figure 3. The results of optimizing the KNN method with the Euclidean distance formula

Figure 4. The results of optimizing the KNN method with the Hamming distance formula


3.8. The results of optimizing the KNN method with the Manhattan distance formula

From the 18,000 old cases, the closeness value with the new data is calculated by
applying the Manhattan distance formula. The display of the system is shown in Figure 5.
Figure 5 shows that the proximity value of all the data used is calculated with the Manhattan distance
formula, with 18,000 old cases as training data for the KNN method. For the new case that is input,
the closeness value is 0.8133 for 29 old cases with the attribute agree = no. That is, with
the proximity value calculated by the Manhattan distance formula, no old case has a proximity value
of 1, but 29 old cases in the KNN proximity list view have a value of 0.8133, within 16 minutes
11 seconds.



































[Screenshot of the nearest neighbor analysis application: proximity list with closeness value 0.8133 and agree = no, completed in 00:16:11.516]





Figure 5. Results of optimizing the KNN method with manhattan distance formula 


3.9. Discussion 
After testing with 18,000 old cases, new case data were input and the closeness between the new case
and the old cases was calculated with the Euclidean, Hamming and Manhattan distance formulas.
With the Euclidean distance formula, the system required 17 minutes 45 seconds to calculate
the proximity value, and the result was a proximity value of 1 (similar) for 8 old cases.
With the Hamming distance formula, the same amount of data and the same new case required
13 minutes 13 seconds; the resulting proximity value was 0.884, which is close to 1 (similar),
for 29 old cases. With the Manhattan distance formula, calculating the proximity for the same old case
data and new case took 16 minutes 11 seconds; no case had a closeness value of 1 (similar), but 29 old
cases approached it with a value of 0.8133. The comparison of these results is shown in Table 20.
The optimization should deliver a suitable solution to the problem within its context by considering
various factors such as total cost, aggregation value, maximum loading point, consistency
of performance, system losses, category accuracy, extracted features and so on [22-25].


Table 20. The result of optimization comparison

No  Distance Formula    Number of Old Cases  Time           Closeness Value  Number of Data with Closeness Value
1   Euclidean distance  18,000               17 min 45 sec  1                8
2   Hamming distance    18,000               13 min 13 sec  0.884            29
3   Manhattan distance  18,000               16 min 11 sec  0.8133           29
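
To make the timing column of Table 20 easier to compare, the short Python sketch below converts the reported processing times to seconds and identifies the fastest formula. Only the times come from Table 20; the helper name `to_seconds` and the dictionary layout are assumptions for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to a number of seconds."""
    return minutes * 60 + seconds


# Processing times reported in Table 20 for 18,000 old cases.
timings = {
    "Euclidean": to_seconds(17, 45),
    "Hamming": to_seconds(13, 13),
    "Manhattan": to_seconds(16, 11),
}

fastest = min(timings, key=timings.get)

for name, secs in timings.items():
    print(f"{name}: {secs} s")
print("Fastest:", fastest)  # Fastest: Hamming
```

In seconds, the three runs take 1065, 793 and 971 respectively, so the Hamming run is 272 seconds shorter than the Euclidean run.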


4. CONCLUSION 

With the Euclidean distance formula, the closeness value is 1 (similar) for 8 old cases, and
processing the 18,000 old cases on the system requires 17 minutes 45 seconds. With the Hamming
distance formula, no old case has a closeness value of 1 (similar); the value is 0.884 for 29 old
cases, and processing the 18,000 old cases requires 13 minutes 13 seconds. With the Manhattan
distance formula, no old case has a closeness value of 1 (similar) either; the value is 0.8133
for 29 old cases, and processing the 18,000 old cases requires 16 minutes 11 seconds.